Word Extraction and Character Segmentation from Text Lines of Unconstrained Handwritten Bangla Document Images

Internal identifier: 000402 (Main/Exploration); previous: 000401; next: 000403

Authors: Ram Sarkar [India]; Samir Malakar [India]; Nibaran Das [India]; Subhadip Basu [India]; Mahantapas Kundu [India]; Mita Nasipuri [India]

Source:

Journal of Intelligent Systems [ 0334-1860 ] ; 2011-11.

RBID : ISTEX:E7E227AC3E1558DBA130EB52774013AA643A6A38

Abstract

This paper reports a novel approach to word extraction and character segmentation from handwritten Bangla document images. First, a modified Run Length Smoothing Algorithm (RLSA), called the Spiral Run Length Smearing Algorithm (SRLSA), is applied to extract words from the text lines of unconstrained handwritten Bangla document images. This technique helps to overcome some of the drawbacks of the standard horizontal and vertical RLSA techniques. The SRLSA technique was applied to the Bangla handwritten document image database CMATERdb1.1.1, and the success rate of word extraction was found to be 86.01%. The second part of the work presents a solution to the problem of how best to segment word images of handwritten Bangla script into their constituent characters. The technique can also segment words that have discontinuities in the Matra, a prominent feature of Bangla script. It further optimizes the trade-off between under- and over-segmentation, because the Matra region and the segmentation points are estimated more precisely. As a result, better word segmentation accuracy is achieved with minimal data loss. A success rate of 92.48% is observed on a dataset of 750 handwritten Bangla words, which is 3.35% higher than that of our earlier techniques.
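The abstract builds on the standard horizontal RLSA but does not spell out how the spiral variant (SRLSA) works. As background only, here is a minimal Python sketch of plain horizontal RLSA on a binary image, where 1 is ink and 0 is background. It is not the authors' SRLSA. The function names, the threshold parameter, and the choice to fill only gaps bounded by ink on both sides are assumptions made for illustration.

```python
from typing import List


def horizontal_rlsa(row: List[int], threshold: int) -> List[int]:
    """Smear one binary row: fill background runs (0s) of length <= threshold
    that lie between two foreground pixels (1s).

    Assumption: runs touching the row border are left untouched.
    """
    out = list(row)
    last_fg = None  # index of the most recent foreground pixel seen
    for i, pixel in enumerate(row):
        if pixel == 1:
            if last_fg is not None:
                gap = i - last_fg - 1
                if 0 < gap <= threshold:
                    # Short gap: merge the two ink runs into one block.
                    for j in range(last_fg + 1, i):
                        out[j] = 1
            last_fg = i
    return out


def smear_image(image: List[List[int]], threshold: int) -> List[List[int]]:
    """Apply horizontal RLSA independently to every row of a binary image."""
    return [horizontal_rlsa(row, threshold) for row in image]


if __name__ == "__main__":
    line = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
    # The 2-pixel gap is filled; the 4-pixel gap (a likely word gap) is kept.
    print(horizontal_rlsa(line, threshold=2))
```

With a suitable threshold, small gaps inside a word are bridged while wider inter-word gaps survive, so connected blocks in the smeared image approximate word candidates.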

Url:

https://api.istex.fr/document/E7E227AC3E1558DBA130EB52774013AA643A6A38/fulltext/pdf

DOI: 10.1515/jisys.2011.013

Affiliations:

India

Links to previous steps (curation, corpus, ...)

to stream Istex, to step Corpus: 000401
to stream Istex, to step Curation: 000394
to stream Istex, to step Checkpoint: 000058
to stream Main, to step Merge: 000407
to stream Main, to step Curation: 000402

The document in XML format

<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Word Extraction and Character Segmentation from Text Lines of Unconstrained Handwritten Bangla Document Images</title>
<author><name sortKey="Sarkar, Ram" sort="Sarkar, Ram" uniqKey="Sarkar R" first="Ram" last="Sarkar">Ram Sarkar</name>
</author>
<author><name sortKey="Malakar, Samir" sort="Malakar, Samir" uniqKey="Malakar S" first="Samir" last="Malakar">Samir Malakar</name>
</author>
<author><name sortKey="Das, Nibaran" sort="Das, Nibaran" uniqKey="Das N" first="Nibaran" last="Das">Nibaran Das</name>
</author>
<author><name sortKey="Basu, Subhadip" sort="Basu, Subhadip" uniqKey="Basu S" first="Subhadip" last="Basu">Subhadip Basu</name>
</author>
<author><name sortKey="Kundu, Mahantapas" sort="Kundu, Mahantapas" uniqKey="Kundu M" first="Mahantapas" last="Kundu">Mahantapas Kundu</name>
</author>
<author><name sortKey="Nasipuri, Mita" sort="Nasipuri, Mita" uniqKey="Nasipuri M" first="Mita" last="Nasipuri">Mita Nasipuri</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:E7E227AC3E1558DBA130EB52774013AA643A6A38</idno>
<date when="2011" year="2011">2011</date>
<idno type="doi">10.1515/jisys.2011.013</idno>
<idno type="url">https://api.istex.fr/document/E7E227AC3E1558DBA130EB52774013AA643A6A38/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000401</idno>
<idno type="wicri:Area/Istex/Curation">000394</idno>
<idno type="wicri:Area/Istex/Checkpoint">000058</idno>
<idno type="wicri:doubleKey">0334-1860:2011:Sarkar R:word:extraction:and</idno>
<idno type="wicri:Area/Main/Merge">000407</idno>
<idno type="wicri:Area/Main/Curation">000402</idno>
<idno type="wicri:Area/Main/Exploration">000402</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Word Extraction and Character Segmentation from Text Lines of Unconstrained Handwritten Bangla Document Images</title>
<author><name sortKey="Sarkar, Ram" sort="Sarkar, Ram" uniqKey="Sarkar R" first="Ram" last="Sarkar">Ram Sarkar</name>
<affiliation wicri:level="1"><country xml:lang="fr">Inde</country>
<wicri:regionArea>Department of Computer Science and Engineering, Jadavpur University, Kolkata</wicri:regionArea>
<wicri:noRegion>Kolkata</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Malakar, Samir" sort="Malakar, Samir" uniqKey="Malakar S" first="Samir" last="Malakar">Samir Malakar</name>
<affiliation wicri:level="1"><country xml:lang="fr">Inde</country>
<wicri:regionArea>Department of Computer Application, MCKV Institute of Engineering, Liluah, Howrah</wicri:regionArea>
<wicri:noRegion>Howrah</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Das, Nibaran" sort="Das, Nibaran" uniqKey="Das N" first="Nibaran" last="Das">Nibaran Das</name>
<affiliation wicri:level="1"><country xml:lang="fr">Inde</country>
<wicri:regionArea>Department of Computer Science and Engineering, Jadavpur University, Kolkata</wicri:regionArea>
<wicri:noRegion>Kolkata</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Basu, Subhadip" sort="Basu, Subhadip" uniqKey="Basu S" first="Subhadip" last="Basu">Subhadip Basu</name>
<affiliation wicri:level="1"><country xml:lang="fr">Inde</country>
<wicri:regionArea>Department of Computer Science and Engineering, Jadavpur University, Kolkata</wicri:regionArea>
<wicri:noRegion>Kolkata</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Kundu, Mahantapas" sort="Kundu, Mahantapas" uniqKey="Kundu M" first="Mahantapas" last="Kundu">Mahantapas Kundu</name>
<affiliation wicri:level="1"><country xml:lang="fr">Inde</country>
<wicri:regionArea>Department of Computer Science and Engineering, Jadavpur University, Kolkata</wicri:regionArea>
<wicri:noRegion>Kolkata</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Nasipuri, Mita" sort="Nasipuri, Mita" uniqKey="Nasipuri M" first="Mita" last="Nasipuri">Mita Nasipuri</name>
<affiliation wicri:level="1"><country xml:lang="fr">Inde</country>
<wicri:regionArea>Department of Computer Science and Engineering, Jadavpur University, Kolkata</wicri:regionArea>
<wicri:noRegion>Kolkata</wicri:noRegion>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="j">Journal of Intelligent Systems</title>
<title level="j" type="abbrev">Journal of Intelligent Systems</title>
<idno type="ISSN">0334-1860</idno>
<idno type="eISSN">2191-026X</idno>
<imprint><publisher>Walter de Gruyter GmbH &amp; Co. KG</publisher>
<date type="published" when="2011-11">2011-11</date>
<biblScope unit="volume">20</biblScope>
<biblScope unit="issue">3</biblScope>
<biblScope unit="page" from="227">227</biblScope>
<biblScope unit="page" to="260">260</biblScope>
</imprint>
<idno type="ISSN">0334-1860</idno>
</series>
<idno type="istex">E7E227AC3E1558DBA130EB52774013AA643A6A38</idno>
<idno type="DOI">10.1515/jisys.2011.013</idno>
<idno type="ArticleID">jisys.20.3.227</idno>
<idno type="Related-article-Href">jisys.2011.013.pdf</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0334-1860</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">In this paper, a novel approach for word extraction and character segmentation from the handwritten Bangla document images is reported. At first, a modified Run Length Smoothing Algorithm (RLSA), called Spiral Run Length Smearing Algorithm (SRLSA), is applied for the extraction of words from the text lines of unconstrained handwritten Bangla document images. This technique has helped to overcome some of the drawbacks of standard horizontal and vertical RLSA techniques. SRLSA technique has been applied on the Bangla handwritten document image database CMATERdb1.1.1 and the success rate of the word extraction is found to be 86.01%. In the second part of the work, we have presented a useful solution to the problem on how best word images of handwritten Bangla script can be segmented into constituent characters. Moreover, the technique can segment the words having discontinuity in Matra, a prominent feature of Bangla script. It also optimizes the trade-off between under/over segmentation as Matra region and segmentation points are estimated more precisely. As a result, better word segmentation accuracy is achieved with minimal data loss. Here, a success rate of 92.48% is observed on a dataset of 750 handwritten Bangla words which is 3.35% higher than that of our earlier techniques.</div>
</front>
</TEI>
<affiliations><list><country><li>Inde</li>
</country>
</list>
<tree><country name="Inde"><noRegion><name sortKey="Sarkar, Ram" sort="Sarkar, Ram" uniqKey="Sarkar R" first="Ram" last="Sarkar">Ram Sarkar</name>
</noRegion>
<name sortKey="Basu, Subhadip" sort="Basu, Subhadip" uniqKey="Basu S" first="Subhadip" last="Basu">Subhadip Basu</name>
<name sortKey="Das, Nibaran" sort="Das, Nibaran" uniqKey="Das N" first="Nibaran" last="Das">Nibaran Das</name>
<name sortKey="Kundu, Mahantapas" sort="Kundu, Mahantapas" uniqKey="Kundu M" first="Mahantapas" last="Kundu">Mahantapas Kundu</name>
<name sortKey="Malakar, Samir" sort="Malakar, Samir" uniqKey="Malakar S" first="Samir" last="Malakar">Samir Malakar</name>
<name sortKey="Nasipuri, Mita" sort="Nasipuri, Mita" uniqKey="Nasipuri M" first="Mita" last="Nasipuri">Mita Nasipuri</name>
</country>
</tree>
</affiliations>
</record>
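As an illustration of how the bibliographic fields in the record might be read programmatically, the following Python sketch parses a small fragment of the `<series>` element with the standard-library `xml.etree.ElementTree` module. The fragment is copied from the record, with the ampersand in the publisher name written as `&amp;` as XML requires. The function name `summarize_series` and the shape of the returned dictionary are assumptions. The full record also uses the `wicri:` prefix, whose namespace declaration is not shown in this excerpt, so a strict parser would likely need that declaration before it could load the whole record.

```python
import xml.etree.ElementTree as ET

# Fragment of the <series> element from the record above.
FRAGMENT = """<series><title level="j">Journal of Intelligent Systems</title>
<idno type="ISSN">0334-1860</idno>
<imprint><publisher>Walter de Gruyter GmbH &amp; Co. KG</publisher>
<date type="published" when="2011-11">2011-11</date>
<biblScope unit="volume">20</biblScope>
<biblScope unit="issue">3</biblScope>
<biblScope unit="page" from="227">227</biblScope>
<biblScope unit="page" to="260">260</biblScope>
</imprint>
</series>"""


def summarize_series(xml_text: str) -> dict:
    """Collect journal, publisher, volume, issue, and page range from a
    <series> fragment (hypothetical helper, for illustration only)."""
    root = ET.fromstring(xml_text)
    imprint = root.find("imprint")
    scopes = imprint.findall("biblScope")
    # Volume and issue are identified by the value of the "unit" attribute.
    by_unit = {s.get("unit"): s.text for s in scopes if s.get("unit") != "page"}
    # The page range is split across two elements with "from" and "to".
    first = next(s.get("from") for s in scopes if s.get("from"))
    last = next(s.get("to") for s in scopes if s.get("to"))
    return {
        "journal": root.findtext("title"),
        "issn": root.findtext("idno"),
        "publisher": imprint.findtext("publisher"),
        "date": imprint.find("date").get("when"),
        "volume": by_unit["volume"],
        "issue": by_unit["issue"],
        "pages": f"{first}-{last}",
    }


if __name__ == "__main__":
    for key, value in summarize_series(FRAGMENT).items():
        print(f"{key}: {value}")
```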

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000402 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000402 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:E7E227AC3E1558DBA130EB52774013AA643A6A38
   |texte=   Word Extraction and Character Segmentation from Text Lines of Unconstrained Handwritten Bangla Document Images
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

	OCR exploration server
	Warning: this site is under development! Warning: this site was generated automatically from raw corpora. The information has therefore not been validated.
